Entropy and Speech
Abstract
In this thesis, we study the representation of speech signals and the estimation of information-theoretical measures from observations containing features of the speech signal. The main body of the thesis consists of four research papers. Paper A presents a compact representation of the speech signal that facilitates perfect reconstruction. The representation consists of models, model parameters, and signal coefficients. Unlike existing speech representations, it achieves compactness by adapting the models to maximally concentrate the energy of the signal coefficients according to a selected energy concentration criterion. The individual parts of the representation are closely related to speech signal properties such as spectral envelope, pitch, and voiced/unvoiced signal coefficients, which is beneficial for both speech coding and modification. From the information-theoretical measure of entropy, performance limits in coding and classification can be derived. Papers B and C discuss the estimation of differential entropy. Paper B describes a method for estimating differential entropy when the vector observations (from the representation) lie on a lower-dimensional surface (manifold) in the embedding space. In contrast to the method presented in Paper B, Paper C introduces a method where the manifold structures are destroyed by constraining the resolution of the observation space. This facilitates the estimation of bounds on classification error rates even when the manifolds are of varying dimensionality within the embedding space. Finally, Paper D investigates the amount of shared information between spectral features of narrow-band (0.3-3.4 kHz) and high-band (3.4-8 kHz) speech.
The results in Paper D indicate that the information shared between the high-band and the narrow-band is insufficient for high-quality wideband speech coding (0.3-8 kHz) without transmission of extra information describing the high-band.
Similar resources
Spectral Entropy as Speech Features for Speech Recognition
This paper presents an investigation of spectral entropy features, used for voice activity detection, in the context of speech recognition. The entropy is a measure of disorganization and it can be used to measure the peakiness of a distribution. We compute the entropy features from the short-time Fourier transform spectrum, normalized as a PMF. The concept of entropy shows that the voiced regi...
Weighted Entropy Cortical Algorithms for Modern Standard Arabic Speech Recognition
Cortical algorithms (CA), inspired by and modeled after the human cortex, have shown superior accuracy in a few machine learning applications. However, CA have not been extensively applied to speech recognition, in particular for the Arabic language. Motivated to apply CA to Arabic speech recognition, we present in this paper an improved CA that is efficiently trained using an entrop...
Corpus Based Evaluation of Entropy Rate Speech Segmentation
The sequence of estimates of the speech signal’s entropy rate is investigated as a potential basis for speech segmentation. Rising and falling edges of that entropy rate curve and its maxima and minima are considered as candidates for segment boundaries. These prominent points are compared to the phonetic segment boundaries and to acoustic landmarks. The comparison is made using the American T...
Entropy Rate-based Stationary / Non-stationary Segmentation of Speech
This study evaluates the potential of the entropy rate contour to identify stationary and non-stationary segments of speech signals. The segmentation produced by an entropy rate-based method is compared to the manual phoneme segmentations of the TIMIT and the KIEL corpora. Characteristic points, i.e. the steepest rises and falls of the entropy rate curve and its maxima and minima, are investigated t...
A Maximum Entropy Method for Language Modelling
The language models used for automatic speech recognition (ASR) are often based on very simple Markov models. This paper presents an overview of a more powerful modelling technique, Maximum Entropy (ME), and its application in language modelling. Preliminary results indicate that ME models are viable for this task, and perform slightly better than the traditional models.
Speech/music segmentation using entropy and dynamism features in a HMM classification framework
In this paper, we present a new approach towards high performance speech/music discrimination on realistic tasks related to the automatic transcription of broadcast news. In the approach presented here, an artificial neural network (ANN) trained on clean speech only (as used in a standard large vocabulary speech recognition system) is used as a channel model at the output of which the entropy a...
Publication date: 2006